February 2004 Blog Posts

Raymond Chen slaps the wrists of developers who've ever used the GetDesktopWindow() call to populate the window handle parameter of some API call or other (who me?).

In reply to this:

IMHO viewstate is the second worst piece of the .NET Framework. The worst being "javascript:do_postback" instead of providing clean, lean and mean URLs. Oh wait, they are related -- the latter is the necessary workaround for the former.

Scott Hanselman writes:

I think it's actually damned clever, and quite possibly necessary. The HTTP/HTML combination needed an eventing subsystem built on top of it. DoPostback() does just that with support for Event Targets and Event Arguments. It's simple, supported, and clean.

I agree. I've said it before and I'll say it again: the whole framework of nested controls within ASP.NET fits together in such a seemingly simple and logical way that it sometimes seems astounding it took so long to get here. In fact, that is one of the facets of a well-designed system: it takes effort to make something appear so obvious, and in my opinion it's a sure sign you've achieved something good.
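
The Event Targets and Event Arguments Scott mentions map onto a real interface, System.Web.UI.IPostBackEventHandler. As a rough sketch (ClickableLabel is a made-up control, not part of the framework), a custom control plugs into __doPostBack like this:

    // A made-up control showing how __doPostBack carries an event target and
    // argument back to the server (ASP.NET 1.x style).
    using System;
    using System.Web.UI;

    public class ClickableLabel : Control, IPostBackEventHandler
    {
        public event EventHandler Clicked;

        protected override void Render(HtmlTextWriter writer)
        {
            // Emits something like __doPostBack('ClickableLabel1','clicked').
            string script = Page.GetPostBackEventReference(this, "clicked");
            writer.Write("<a href=\"javascript:" + script + "\">Click me</a>");
        }

        // ASP.NET routes the posted-back event target/argument to this method.
        public void RaisePostBackEvent(string eventArgument)
        {
            if (Clicked != null)
                Clicked(this, EventArgs.Empty);
        }
    }

Drop it inside a <form runat="server"> and the framework emits the __doPostBack helper plus the __EVENTTARGET and __EVENTARGUMENT hidden fields that carry the event back - which is all DoPostback() really is.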

Some people disagree: It's not progress when developers who spend *years* on the platform learn more about how to hack it into submission than how it works in the first place.

To the naysayers who think spending hours just getting a form to remember what was typed into it is fun, I say go back to ASP and revel in the fact that it is closer to raw HTTP. I like the new world order and I'll stick with ASP.NET. I like the fact that when someone asks for a data capture form it doesn't cause everyone to groan at the impending pain.

It seems to me that most of the people who dislike viewstate (and I'm not referring to anyone in particular) develop that dislike after seeing it implemented badly. Using it well in a scalable web application takes some expertise. However, the world also runs on ragged Excel spreadsheets and hacked-up VB applets written by people whose job is simply to get stuff done. If ASP.NET also lets those people get their job done on a web server, with an eventing model they're familiar with, and it works for them, then I don't have a problem with that.

Spirit fell silent, alone on the emptiness of Mars, trying and trying to reboot. And its human handlers at JPL seemed at a loss to help, unable to diagnose a system they could not see.

Fascinating! [via Early Adopter]

LauraJ: Exciting news here about the InfoPath 2003 SP-1 preview that you can download here.

Craig describes tracking down a nasty bug that turned out to be interop-related, caused by a missing [STAThread] attribute on his Main(). On the other hand, Eric Gunnerson wrote some time ago about how this attribute will be missing from the boilerplate in console applications in Whidbey. Will this cause future problems?
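
For reference, the attribute in question is just a marker on the entry point telling the CLR to initialise the main thread as a single-threaded apartment before any COM interop takes place - a minimal example:

    using System;

    class Program
    {
        // Without [STAThread] the main thread defaults to the multithreaded
        // apartment, which is where interop with STA-only COM components hurts.
        [STAThread]
        static void Main(string[] args)
        {
            Console.WriteLine("Main is running in a single-threaded apartment.");
        }
    }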

Mike Taulty: ...when I went to University (here, starting around 15 years ago) we did some practical computing work but the emphasis was on computing theory. So, we did compilers, languages, databases and so on but we spent more time on computability, coding theory, linear algebra, automata, grammars, algorithms and complexity analysis and so on.

The thing that's always struck me about this is that during my career these things remain constant whilst the technologies around change on a constant basis - I wonder what happens "tomorrow" if you spend 70% of your time learning how to do today's technologies?

When I was at university (for the record here) one of the criticisms I had was the lack of practical teaching. It felt like few of the lecturers had real world experience and that they were churning out graduates with little more than pure mathematics degrees and little actual programming skill. I often wondered how most of the students would fare in a real software development role. I thought I gained more practical knowledge on my own time in the labs than I ever did in lectures.

Now, after more than 10 years in this industry, I find myself often referring back to topics covered during those lectures. Understanding that an algorithm is O(n^2) or knowing the theory underlying concurrency issues with threading or databases is something I now treat as common knowledge/sense. It isn't though - it's that theoretical background shining through. Today I'm ever grateful for the time spent looking at things that seemed somewhat pointless at the time.

Michael Platt: I have done a ton of interviewing of architects and typically look for this simulated annealing in their problem-solving approach, using a number of architectural problems that I have come across that cannot be solved by [divide and conquer] (in fact it makes the problem worse). One of my favourite non-technical interview questions is very simple and designed to see how people analyse problems. Their response allows you to categorise them roughly into mathematical, logical or analytical thinkers. Developers tend to be logical, architects tend to be analytical; I'm not sure what the mathematicians are! ... Here's the question if you want to see what you are:

I have a cup of coffee and a cup of tea. I take a teaspoon full of coffee and put it in the tea. I then take a teaspoon of the tea and put it in the coffee. Which is the purest, the coffee or the tea? Explain your thinking.

Well, I went for the mathematical approach. Not sure what that means.
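
For anyone who wants to work it through, the arithmetic goes something like this (my sketch, with V the volume of each cup and s the volume of the teaspoon):

    % After the first transfer the tea cup holds V of tea plus s of coffee,
    % so the spoonful carried back is sV/(V+s) of tea and s^2/(V+s) of coffee.
    \text{coffee left in the coffee cup} = V - s + \frac{s^{2}}{V+s} = \frac{V^{2}}{V+s}
    \text{tea left in the tea cup} = V - \frac{sV}{V+s} = \frac{V^{2}}{V+s}

Each cup ends up back at volume V holding the same amount of its original liquid, so neither is purer than the other.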

Jon Kale talks about how, even when work sucks, life can be good with VMware and a dose of Snapshot and Revert. I think my experiment with Virtual PC 2004 is over - I can't live without this.

Craig points to The Craftsman Series. I also read and enjoyed these articles after a recent recommendation.

Robert Hurlbut continues the thread about distributed computing with reference to data security. The thrust of the piece is about providing additional security in depth by physically distributing the data access tier of your application.

At a high level I sort of agree with this and my comments may really be a difference of terminology rather than something more fundamental.

First of all, I'd be reluctant to make the distribution break purely at the data access level. For me, the data access tier is all about dealing with the storage of entities. Each component deals with only one entity and as such each method only reads or writes to one entity type at a time. (By entity I typically mean the nouns in your system, and these tend to map to the main tables in a database - things like a person, product, or order.) The next layer up uses business rules to combine these entity operations into meaningful business transactions (e.g. creating an order might create an order entity and add line item entities to it, etc.). I am more inclined to provide a distributed service using a business facade over these business rules, and for that to be the security barrier. This helps to ensure that data integrity is maintained by the business rules and promotes reuse of the service in a more robust manner. As I said, I'm not sure if this is what Robert means or whether we differ here.
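
To put that layering into code, here is a hedged sketch with made-up names (OrderData, LineItemData, OrderFacade): each data access component touches a single entity, and the facade combines them into one business transaction - it is the facade I would expose and secure.

    // Hypothetical data access components: each deals with exactly one entity type.
    public class OrderData
    {
        public int InsertOrder(int customerId)
        {
            // INSERT into the Orders table and return the new order id.
            return 0;
        }
    }

    public class LineItemData
    {
        public void InsertLineItem(int orderId, int productId, int quantity)
        {
            // INSERT into the LineItems table.
        }
    }

    // The business facade combines entity operations into a meaningful business
    // transaction; this is the layer to distribute and put the security barrier on.
    public class OrderFacade
    {
        public int CreateOrder(int customerId, int[] productIds, int[] quantities)
        {
            OrderData orders = new OrderData();
            LineItemData lineItems = new LineItemData();

            int orderId = orders.InsertOrder(customerId);
            for (int i = 0; i < productIds.Length; i++)
                lineItems.InsertLineItem(orderId, productIds[i], quantities[i]);

            return orderId;
        }
    }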

Secondly, and Robert suggests this but I want to be more explicit, I think you should work hard to build a secure network boundary at these service points between the consuming applications and the underlying facade with its data store. In practical terms this often means some kind of firewall or filtering router/switch.

Robert talks in depth about the choice of communication channel for interacting with these services. My comment to this is really about the future: the message I took away from the PDC is that if you're looking towards Indigo you want to think web services (my preference) or COM+/ES. In other words, Remoting isn't the way forward.

Update: Robert clarifies that we do see eye to eye on much of this topic.

I've created a new version of my Outlook Attachment Security Unlock Applet that works with Outlook 2003.

Despite my initial scepticism, and what I consider to be a lack of direction from Microsoft about how best to utilise InfoPath, I've been bitten by the InfoPath bug over the past couple of months and we're using it as the central forms engine on one of the projects I'm currently involved with.

As I think everyone probably does, I did wonder about the lack of a runtime-only version of InfoPath. After all, there are viewers for Word, Excel, PowerPoint, and I think Visio, and you can use the Jet database engine to read .MDB files. But if viewing is all you want, you can work with the files InfoPath creates - they use standard formats like XSLT and XSD. LauraJ has the definitive answer on this topic.

Google Toolbar

Google's habit of changing its graphics goes one step further... onto the desktop.

Scott Hanselman: I am a fan of XmlSerialization, and I'm a fan of anything that makes my job easier. We're using the hell out of XmlSerialization on a project I'm on.

Hear, hear!
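
For anyone who hasn't used it, this is roughly all there is to it - a minimal sketch with a made-up Customer class, round-tripping an object through XML:

    using System;
    using System.IO;
    using System.Xml.Serialization;

    public class Customer
    {
        public string Name;
        public string Email;
    }

    class Demo
    {
        static void Main()
        {
            Customer customer = new Customer();
            customer.Name = "Jane";
            customer.Email = "jane@example.com";

            // Object -> XML.
            XmlSerializer serializer = new XmlSerializer(typeof(Customer));
            StringWriter writer = new StringWriter();
            serializer.Serialize(writer, customer);
            Console.WriteLine(writer.ToString());

            // XML -> object.
            StringReader reader = new StringReader(writer.ToString());
            Customer copy = (Customer)serializer.Deserialize(reader);
            Console.WriteLine(copy.Name);
        }
    }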

Robert Hurlbut follows up Sam Gentile's post about distributed development. There's not much I can argue with here. In my response yesterday, I mentioned security as being one situation in which I support physical separation of the public presentation tier from the underlying business logic.

I am curious, however, about what a "well-defined pipe using COM+/ES server components" looks like from a security perspective. Doesn't this require DCOM, which, from memory, pretty much blew a hole in any firewall?

Tiago Pascoal agrees with me about horizontally scaling applications and shares the same experiences with respect to database bottlenecks. He also calls for Sam and Robert to share more insights about their architecture - this exchange of ideas and experience is an excellent way to learn more about this subject.

The February 2004 issue of Communications of the ACM includes an article entitled How clean is the future of SOAP? (sorry, you have to be a member or have a subscription to read the full article). The basic thrust of the piece is that if developers aren't careful with how they develop SOAP-based applications then security staff will close firewalls to SOAP over HTTP and web services will lose their primary advantage, that of being able to penetrate firewalls on port 80.

I thought we were past all this. I remember having a conversation with someone at Microsoft UK back in 2000, asking about this point and wondering whether it wouldn't be better to just pick a dedicated SOAP port instead of 80. I now realise that was rather missing the point, and that deploying web services as part of a web server has been one of the factors leading to their ongoing success (or promise, depending upon your point of view).

The author of this document can be forgiven for not knowing that SOAP is no longer an acronym - it formerly stood for Simple Object Access Protocol but is now just a name - as that fact probably isn't widely known and is tucked away in the SOAP 1.2 recommendation. However, even early on the binding to HTTP was only the most popular mechanism, not the only one - I seem to remember demos over SMTP some time ago. More recently, most people au fait with the current state of play will see that SOAP is really about message passing and that the transport is orthogonal to that.

To claim that web services suddenly expose previously unavailable internal application behaviour to external users seems to ignore the state of the web today. There can't be many corporate web sites actively engaged in driving revenue that serve only static content these days. Most web sites contain web applications and expose internal application behaviour through an HTML and HTTP GET/POST interface (normally used with a browser). Exposing the same functionality with XML and HTTP GET/POST doesn't in and of itself make things any less secure.

Developers need to be concerned not only with the code they expose through web services but equally (and perhaps more subtly) with anything they expose through any kind of web server. Similarly, security professionals need to be far more deeply involved in understanding the business processes driving the use of web applications and web services in specific terms than simply considering shutting the firewall to SOAP over HTTP.

To be honest, I'd have expected more stringent peer review of an article published in such a prestigious journal.

Sam Gentile writes about how he feels that developers don't get the distributed paradigm when developing in .NET and that the documentation and literature doesn't help any.

I'm not sure I agree with his premise, however, that it is normally a good idea to distribute the tiers onto different hardware. In fact, my usual approach is to encourage multiple tiers to reside on the same front-end box in such a way that you can duplicate those boxes for redundancy and scaling using network load balancing on the front end.

This doesn't bring us back to client/server systems: n-tier architectures are logical software designs and not necessarily tied to physical deployment designs. Client/server was all about long lived connections to the database and that was what didn't scale. N-tier is all about get in quick, get what you want, and get out. It's also about separating presentation from business logic from data storage. None of that implies multiple hardware layers.

Sam claims that deploying the middle tier on separate boxes usually gives far better performance relative to hardware costs. I'm not sure that my experience supports that conclusion. It might be true for very high transaction systems but, on a smaller scale, communication costs between the layers can add significant latency.

In recent times, the occasions where I have supported the "middle tier" on separate hardware have been in what we're learning to call Service Oriented situations where the functionality has been exposed either with remoting or, more favourably, with a web service. In general, this has been for one of two reasons: security, where it is possible to put an additional firewall between the front-end external presentation hardware and the underlying internal service (thanks to avoiding the blunderbuss DCOM approach to firewalls); and for deploying subsystems (such as search engines) where this key functionality might be extended, scaled, upgraded, or otherwise changed completely independently of the rest of the system.
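
As a simplified illustration of the web service flavour of that separation (SearchService and SearchEngine are made-up names), exposing a subsystem through an .asmx facade means the firewall only has to pass HTTP to one endpoint rather than opening up everything DCOM would need:

    // SearchService.asmx.cs - a thin web service facade over an internal subsystem.
    using System.Web.Services;

    public class SearchService : WebService
    {
        [WebMethod]
        public string[] Search(string query)
        {
            // Delegate to the internal subsystem behind the firewall.
            return SearchEngine.Find(query);
        }
    }

    // Hypothetical internal subsystem the facade delegates to.
    public class SearchEngine
    {
        public static string[] Find(string query)
        {
            return new string[] { "result for " + query };
        }
    }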

What I have found to be typical in data-driven distributed systems is that scalability is more of a problem on the back-end database than it is on front-end web servers. It is relatively cheap in development cost, maintenance, and hardware cost to build a system where you can add an extra web server to the cluster if you need a bit more horsepower there. What is much more challenging is designing your application in a way that allows you to scale the database: while front-end hardware is cheap, scaling your database up gets increasingly expensive for decreasing returns and scaling out is something you really need to plan for up front and may require some compromises on things like referential integrity.

I'm not a big fan of distributed transactions through the DTC using COM+/ES in systems that often end up using only one resource manager at a time (say a SQL Server or Oracle database). Back in the COM+/DNA/VB6 days it made life much easier and was a price worth paying, but with .NET and the managed data providers giving tighter access to the database I don't think that is always the case. This is the foundation underlying my declarative SQL transactions code and code generator, where I support ES-like attributes but rely on SQL Server transactions. I recognise that for large systems where you've had to partition and scale out your data the DTC is necessary, but I've worked on a fair number of transactional e-commerce systems that needed front-end scalability yet hit a single database server at the back.
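
To illustrate what I mean by relying on SQL Server transactions rather than the DTC, here is a simplified sketch of the general pattern (made-up table names, not my actual generated code): when both writes hit the same database, a local SqlTransaction does the job without enlisting the DTC.

    using System.Data.SqlClient;

    public class OrderWriter
    {
        public void CreateOrder(string connectionString, int customerId, int productId)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();
                SqlTransaction transaction = connection.BeginTransaction();
                try
                {
                    SqlCommand insertOrder = new SqlCommand(
                        "INSERT INTO Orders (CustomerId) VALUES (@customerId)",
                        connection, transaction);
                    insertOrder.Parameters.Add("@customerId", customerId);
                    insertOrder.ExecuteNonQuery();

                    SqlCommand insertItem = new SqlCommand(
                        "INSERT INTO LineItems (ProductId) VALUES (@productId)",
                        connection, transaction);
                    insertItem.Parameters.Add("@productId", productId);
                    insertItem.ExecuteNonQuery();

                    // Both statements commit or roll back together, entirely inside SQL Server.
                    transaction.Commit();
                }
                catch
                {
                    transaction.Rollback();
                    throw;
                }
            }
        }
    }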

Sam's starting point was looking at data access options with ADO.NET, and he comments on Rob Howard's advice to use DataReaders unless you absolutely need DataSets. I'm in agreement with Sam in finding any reason not to use DataSets; however, I was firmly in the pass-DataReaders camp as the fastest way to get data to your output. Latterly I've had mixed feelings about this. It is true that you can't pass a DataReader across a boundary, but exploding the data into object form only to repeat the process as you bind it to output controls in ASP.NET also seems like anathema, and not entirely distant from the criticism of the bloated DataSet. In some cases I've compromised on separation and passed a DataReader up to the presentation layer; in cases where I know I need (or am likely to need) to cross a boundary, I've transferred the data into "behaviour-less" (state-only) objects. These are readily serialised for transmission by remoting or a web service and you can still use data binding for presentation.
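
In code, that compromise looks something like the sketch below (ProductSummary and the Products table are made-up names): read the rows once with a DataReader, copy them into serialisable state-only objects, and bind those in the page.

    using System;
    using System.Collections;
    using System.Data.SqlClient;

    // State-only object: no behaviour, serialisable for remoting or a web service,
    // and usable as a data-binding source in ASP.NET via its public properties.
    [Serializable]
    public class ProductSummary
    {
        private int id;
        private string name;

        public int Id { get { return id; } set { id = value; } }
        public string Name { get { return name; } set { name = value; } }
    }

    public class ProductReader
    {
        public static ArrayList GetProducts(string connectionString)
        {
            ArrayList products = new ArrayList();
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                SqlCommand command = new SqlCommand(
                    "SELECT Id, Name FROM Products", connection);
                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        ProductSummary product = new ProductSummary();
                        product.Id = reader.GetInt32(0);
                        product.Name = reader.GetString(1);
                        products.Add(product);
                    }
                }
            }
            return products;
        }
    }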

In conclusion, using data binding and DataReaders in the presentation layer doesn't necessarily mean there isn't a careful separation of presentation and business logic, and it doesn't have to mean we're heading back down the road to monolithic client/server applications. The logical and physical architectures of distributed systems don't necessarily have to match.